Add Distillation API to Keras #21572

divyashreepathihalli · 2025-08-11T23:40:21Z

This PR adds Knowledge Distillation API to Keras,

Key Features

Core Components

Distiller: Main distillation model that combines teacher and student models
Strategies: Pluggable distillation strategies (LogitsDistillation, FeatureDistillation, MultiOutputDistillation)

Usage Examples

Basic Knowledge Distillation

import keras
from keras.distillation import Distiller, LogitsDistillation

# Create models
teacher = keras.Sequential([...])  # Large, pre-trained model
student = keras.Sequential([...])  # Smaller model to train

# Set up distillation
distiller = Distiller(
    teacher=teacher,
    student=student,
    strategies=[LogitsDistillation(temperature=3.0)],
    alpha=...
)

# Standard Keras workflow
distiller.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
distiller.fit(x_train, y_train, epochs=10)
predictions = distiller.predict(x_test)

gemini-code-assist

Summary of Changes

Hello @divyashreepathihalli, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Knowledge Distillation API to Keras, designed to facilitate the efficient transfer of learned knowledge from larger, pre-trained "teacher" models to smaller "student" models. The API seamlessly integrates with Keras's existing training, evaluation, and prediction workflows, providing a flexible and extensible framework for various distillation techniques.

Highlights

New Distiller Model: A core Distiller class is added, which is a keras.Model subclass, enabling the combination and training of teacher and student models within the standard Keras workflow.
Pluggable Distillation Strategies: Introduces a BaseDistillationStrategy and three concrete implementations: LogitsDistillation (for softening logits), FeatureDistillation (for intermediate feature matching), and MultiOutputDistillation (for handling models with multiple outputs).
Configurable Loss Balancing: The Distiller allows specifying an alpha parameter to balance the contribution of the student's original loss and the distillation loss.
Automatic Teacher Freezing: The teacher model is automatically set to non-trainable (trainable=False) during the distillation process to prevent its weights from being updated.
Comprehensive Testing: New test files (distiller_test.py and strategies_test.py) are added to ensure the robustness and correctness of the new API, covering initialization, loss computation, and end-to-end workflows.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a comprehensive Knowledge Distillation API to Keras, which is a great addition. The implementation is well-structured with a Distiller model and pluggable strategies. My review has identified a few issues: a critical issue with the FeatureDistillation strategy which is not fully implemented, a high-severity issue with an unused temperature parameter in the Distiller class that could mislead users, and a medium-severity issue regarding a simplistic fallback for loss calculation in multi-output scenarios. Addressing these points will improve the robustness and clarity of this new API.

keras/src/distillation/strategies.py

keras/src/distillation/distiller.py

gemini-code-assist · 2025-08-11T23:41:38Z

keras/src/distillation/distiller.py

+                if isinstance(y_pred, (list, tuple)):
+                    # For multi-output models, use the first output for student
+                    # loss
+                    # This is a simplified approach for compatibility
+                    if isinstance(y, (list, tuple)):
+                        student_loss = self.student_loss_fn(y[0], y_pred[0])
+                    else:
+                        student_loss = self.student_loss_fn(y, y_pred[0])


The fallback logic for calculating the student loss in _compute_loss for multi-output models is overly simplistic as it always defaults to using the first output (y_pred[0]). This might not align with user expectations for all multi-output scenarios and could lead to incorrect training behavior if model.compile() is not called with a loss that properly handles multiple outputs.

While the primary path using self.compiled_loss is correct, this fallback could be made more robust. Consider raising a more specific error if a multi-output model is used without a compiled loss, or clarifying this behavior more explicitly in the documentation.

divyashreepathihalli · 2025-08-12T00:04:57Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a well-structured Knowledge Distillation API to Keras, a valuable addition for model compression and knowledge transfer. The implementation of the Distiller model and pluggable strategies is clean and the code is generally of high quality, with good documentation and examples.

I have identified a few significant issues that should be addressed to improve the robustness and completeness of the API:

A critical flaw in FeatureDistillation limits its use to sequential models, which will prevent it from working with more complex architectures.
The Distiller's loss computation for multi-output models has a bug in its fallback logic.
The Distiller model cannot be correctly serialized and deserialized due to an incomplete get_config implementation.
The tests for FeatureDistillation are not comprehensive enough to cover non-sequential model architectures.

Addressing these points will significantly enhance the reliability and usability of this new API. Overall, this is a great contribution.

keras/src/distillation/strategies.py

keras/src/distillation/distiller.py

keras/src/distillation/strategies_test.py

gemini-code-assist

Code Review

This pull request introduces a comprehensive Knowledge Distillation API to Keras, which is a great addition. The implementation is well-structured with a Distiller model and pluggable DistillationStrategy classes. The code is generally clean and includes a good set of tests.

I've identified a few key areas for improvement:

Serialization: The Distiller and MultiOutputDistillation classes are not correctly configured for serialization, which will prevent models using them from being saved and loaded. This is a critical feature in Keras that needs to be addressed.
Feature Extraction Robustness: The FeatureDistillation strategy uses a method for extracting intermediate layer features that is not robust and will fail for models with non-sequential architectures (e.g., ResNets). This limitation needs to be documented and ideally improved.
Code Simplification: There's a small piece of unreachable code in the Distiller's loss computation that can be simplified.

Addressing these points will significantly improve the robustness and usability of this new API.

keras/src/distillation/distiller.py

keras/src/distillation/strategies.py

gemini-code-assist · 2025-08-12T00:07:53Z

keras/src/distillation/distiller.py

+                if isinstance(y_pred, list) and len(y_pred) > 0:
+                    # For multi-output, use first output for student loss
+                    student_loss = self.student_loss_fn(y[0], y_pred[0])
+                else:
+                    student_loss = self.student_loss_fn(y, y_pred)


The isinstance(y_pred, list) check on line 283 is redundant because y_pred is converted to a list on line 263. This makes the else block on line 286 unreachable. The logic can be simplified to directly use the first output for the student loss calculation.

# Fallback: use student_loss_fn directly # For multi-output, use first output for student loss student_loss = self.student_loss_fn(y[0], y_pred[0])

divyashreepathihalli · 2025-08-12T00:27:35Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a comprehensive and well-designed Knowledge Distillation API to Keras. The implementation is robust, featuring a flexible Distiller class and a set of pluggable distillation strategies that cover common use cases like logits and feature distillation, as well as multi-output models. The code is accompanied by extensive and thorough tests, which is excellent. My feedback includes a couple of suggestions to improve code style in the API files and to enhance the robustness of a test case by removing a broad exception handler. Overall, this is a high-quality contribution that will be a valuable addition to Keras.

keras/src/distillation/distiller_test.py

keras/api/_tf_keras/keras/distillation/__init__.py

keras/api/distillation/__init__.py

codecov-commenter · 2025-08-12T00:47:42Z

Codecov Report

❌ Patch coverage is 71.09635% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.65%. Comparing base (47fcb39) to head (584d225).

Files with missing lines	Patch %	Lines
keras/src/distillation/distillation_loss.py	61.81%	29 Missing and 13 partials ⚠️
keras/src/distillation/distiller.py	77.34%	20 Missing and 21 partials ⚠️
keras/api/_tf_keras/keras/distillation/__init__.py	0.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #21572      +/-   ##
==========================================
- Coverage   82.69%   82.65%   -0.05%     
==========================================
  Files         573      577       +4     
  Lines       58888    59189     +301     
  Branches     9218     9277      +59     
==========================================
+ Hits        48696    48921     +225     
- Misses       7845     7891      +46     
- Partials     2347     2377      +30

Flag	Coverage Δ
keras	`82.45% <71.09%> (-0.04%)`	⬇️
keras-jax	`63.30% <71.09%> (+0.06%)`	⬆️
keras-numpy	`57.53% <20.93%> (-0.19%)`	⬇️
keras-openvino	`34.33% <20.93%> (-0.07%)`	⬇️
keras-tensorflow	`64.07% <71.09%> (+0.05%)`	⬆️
keras-torch	`63.62% <71.09%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fchollet

Thanks for the PR! Some quick comments on the API.

keras/src/distillation/distiller.py

hertschuh · 2025-10-09T23:43:05Z

keras/src/distillation/distiller.py

+                        # Re-raise with context about which strategy failed
+                        raise RuntimeError(
+                            f"Failed to extract features for "
+                            f"FeatureDistillation targeting teacher layer "


nitpick {type(strategy)} instead of FeatureDistillation in case people subclass it.

hertschuh · 2025-10-09T23:45:55Z

keras/src/distillation/distiller.py

+            # Ensure student_loss is a scalar
+            if hasattr(student_loss, "shape") and len(student_loss.shape) > 0:
+                student_loss = keras.ops.mean(student_loss)


The strategy loss raises an error line 504 if it's not scalar. Any reason to handle these differently?

The student loss fn returns per-sample losses by default (shape, (batch_size,)) - the code converts to scalars with keras.ops.mean

But with stategy.compute_loss() method, it is supposed to return scalar losses.

keras/src/distillation/distillation_loss_test.py

hertschuh · 2025-10-09T23:49:31Z

keras/src/distillation/distillation_loss_test.py

+        )
+
+        # Verify that teacher and student outputs have the same structure
+        keras.tree.assert_same_structure(teacher_features, student_features)


This test is not using FeatureDistillation at all. You should instantiate a FeatureDistillation and call the validation method instead.

keras/src/distillation/distillation_loss_test.py

divyashreepathihalli · 2025-10-21T18:07:12Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new Knowledge Distillation API to Keras, which is a valuable addition. The API design is clean and follows the end-to-end workflow principle from the Keras design guidelines. The core components like Distiller, LogitsDistillation, and FeatureDistillation are well-structured.

My review focuses on a few key areas:

Correctness: I found a high-severity issue in the FeatureDistillation loss calculation for cosine similarity which would lead to incorrect training behavior.
Maintainability: There are opportunities to make the code more robust, for example by using isinstance for type checking instead of string matching on names.
Style Guide Adherence: I've pointed out several minor violations of the Keras API design guidelines regarding docstring formatting, specifically the naming of Args: and Examples: sections.

The implementation is solid overall, with good validation and efficient feature extraction for multi-strategy distillation. The accompanying tests are also quite comprehensive. Addressing the feedback will improve the correctness and maintainability of this new API.

keras/src/distillation/distillation_loss.py

keras/src/distillation/distiller.py

keras/src/distillation/distiller_test.py

fchollet · 2025-10-22T18:41:25Z

keras/src/distillation/distiller.py

+            The teacher model is frozen during distillation.
+        student: A `keras.Model` to be trained through distillation.
+        strategies: List of distillation strategies to apply. Can be a single
+            strategy or a list of strategies like `LogitsDistillation`,


"strategy instances like keras.distillation.LogitsDistillation()..."

There is an inconsistency in terminology, where these are sometimes referred to as losses and sometimes as strategies. Which is it? (Could be both: training_strategy or something)

We should make the arg name here consistent with the object class name.

If these are DistillationLoss subclasses then the arg should be distillation_losses

A potential issue with calling them losses is that they're very different from keras.losses.

done! renamed everything to distillation loss

keras/src/distillation/distiller.py

fchollet

Thanks for the PR! Main request from me is to fix the naming inconsistency -- pick a standard class name for distillation losses / strategies and then always refer to instances of the class by that name, e.g. distillation_losses. No strong opinion on what the class name should be.

divyashreepathihalli · 2025-10-23T00:10:42Z

I have addressed all teh comments. Updated the name to be consistent - distillation loss everywhere instead of strategy. I will merge the PR

hertschuh

The bulk replace did some weird stuff, in particular in the docstrings.

But also, the main question is whether the Distiller argument should be called distillation_losses (plural.).

hertschuh · 2025-10-22T23:23:58Z

keras/src/distillation/distillation_loss.py

            student_outputs: Outputs from the student model. Can be a single
                tensor or a list/tuple of tensors for multi-output models.
-            **kwargs: Additional arguments for custom strategies.
+            **kwargs: Additional arguments for custom distillation_loss.


distillation losses.

hertschuh · 2025-10-22T23:24:15Z

keras/src/distillation/distillation_loss.py

        Raises:
-            ValueError: If models are not compatible with this strategy.
+            ValueError: If models are not compatible with this
+                distillation_loss.


distillation loss.

hertschuh · 2025-10-22T23:24:44Z

keras/src/distillation/distillation_loss.py

 @keras_export("keras.distillation.FeatureDistillation")
 class FeatureDistillation(DistillationLoss):
-    """Feature distillation strategy using intermediate layer representations.
+    """Feature distillation distillation_loss.


Feature distillation loss.

hertschuh · 2025-10-22T23:25:07Z

keras/src/distillation/distillation_loss.py

-    This strategy applies temperature scaling to the teacher's logits before
-    computing the loss between teacher and student predictions. It's the most
-    common approach for knowledge distillation.
+    This distillation_loss applies temperature scaling to the teacher's logits


This distillation loss...

hertschuh · 2025-10-22T23:25:53Z

keras/src/distillation/distillation_loss_test.py

 @pytest.mark.requires_trainable_backend
 class TestLogitsDistillation(TestCase):
-    """Test cases for LogitsDistillation strategy."""
+    """Test cases for LogitsDistillation distillation_loss."""


Remove "distillation_loss" (actually, you can remove the whole line, it's not needed).

hertschuh · 2025-10-23T00:37:15Z

keras/src/distillation/distiller.py

-        for strategy in self.strategies:
+        for strategy in self.distillation_loss:


for distillation_loss in self.distillation_losses:

hertschuh · 2025-10-23T00:37:41Z

keras/src/distillation/distiller.py

+                Arguments:
+                    model: The model to create an extractor for.
+                    layer_names: List of layer names to extract features from.

-        Raises:
-            ValueError: If model has no symbolic inputs/outputs.
+                Returns:
+                    Feature extractor model or `None` if no layer names
+                        sprovided.
+        `
+                Raises:
+                    ValueError: If model has no symbolic inputs/outputs.


Indentation seems off.

hertschuh · 2025-10-23T00:38:12Z

keras/src/distillation/distiller.py

+            for strategy, weight in zip(
+                self.distillation_loss, self.distillation_loss_weights
+            ):


for distillation_loss, weight in zip(self.distillation_losses...

hertschuh · 2025-10-23T00:38:28Z

keras/src/distillation/distiller.py

                    self.student
                ),
-                "strategies": [
+                "distillation_loss": [


distillation_losses?

hertschuh · 2025-10-23T00:38:42Z

keras/src/distillation/distiller.py

            config["student"]
        )
-        config["strategies"] = [
+        config["distillation_loss"] = [


distillation_losses?

divyashreepathihalli added 6 commits August 11, 2025 20:59

initial code dump

137a37f

clean up the implementation of the distillation api

8b37482

code reformat

8252b8f

final clean up

9bdec23

pre commit

6efecee

Merge branch 'keras-team:master' into distillation-api

36930e8

google-ml-butler bot added the size:XL label Aug 11, 2025

google-ml-butler bot assigned gbaned Aug 11, 2025

gemini-code-assist bot reviewed Aug 11, 2025

View reviewed changes

divyashreepathihalli requested a review from hertschuh August 11, 2025 23:40

google-ml-butler bot added the awaiting review label Aug 11, 2025

gemini-code-assist bot reviewed Aug 11, 2025

View reviewed changes

divyashreepathihalli marked this pull request as draft August 11, 2025 23:44

address gemini review comments

1f73a69

gemini-code-assist bot reviewed Aug 12, 2025

View reviewed changes

keras/src/distillation/strategies.py Outdated Show resolved Hide resolved

keras/src/distillation/distiller.py Outdated Show resolved Hide resolved

keras/src/distillation/distiller.py Show resolved Hide resolved

keras/src/distillation/strategies_test.py Outdated Show resolved Hide resolved

gemini-code-assist bot reviewed Aug 12, 2025

View reviewed changes

divyashreepathihalli removed the request for review from hertschuh August 12, 2025 00:11

divyashreepathihalli added 2 commits August 11, 2025 17:21

address gemini review comments

88c2468

add a way to save trained student model

9de5809

gemini-code-assist bot reviewed Aug 12, 2025

View reviewed changes

keras/src/distillation/distiller_test.py Outdated Show resolved Hide resolved

keras/api/_tf_keras/keras/distillation/__init__.py Outdated Show resolved Hide resolved

keras/api/distillation/__init__.py Outdated Show resolved Hide resolved

divyashreepathihalli added 2 commits August 11, 2025 17:40

disable tests in numpy and openvino backends

b954718

pre commit

bf6219a

fchollet reviewed Aug 13, 2025

View reviewed changes

divyashreepathihalli added 4 commits August 15, 2025 15:17

address comments

b7e51a9

address comments

e8229c2

run pre-commit

387595a

update distiller and strategies

4d6610a

divyashreepathihalli added the kokoro:force-run label Oct 7, 2025

kokoro-team removed the kokoro:force-run label Oct 7, 2025

hertschuh reviewed Oct 9, 2025

View reviewed changes

divyashreepathihalli added 2 commits October 15, 2025 15:35

address review comments

7184fcd

code reformat

d6febe9

divyashreepathihalli requested a review from hertschuh October 15, 2025 22:44

divyashreepathihalli added the kokoro:force-run label Oct 15, 2025

kokoro-team removed the kokoro:force-run label Oct 15, 2025

gemini-code-assist bot reviewed Oct 21, 2025

View reviewed changes

divyashreepathihalli added 3 commits October 21, 2025 18:35

address gemini review comments

1298d1f

Merge branch 'keras-team:master' into distillation-api

a23dc52

address gemini comments

607dcd8

divyashreepathihalli added the kokoro:force-run label Oct 21, 2025

kokoro-team removed the kokoro:force-run label Oct 21, 2025

hertschuh approved these changes Oct 22, 2025

View reviewed changes

keras/src/distillation/distiller_test.py Outdated Show resolved Hide resolved

google-ml-butler bot added kokoro:force-run ready to pull Ready to be merged into the codebase labels Oct 22, 2025

kokoro-team removed the kokoro:force-run label Oct 22, 2025

fchollet reviewed Oct 22, 2025

View reviewed changes

keras/src/distillation/distiller.py Outdated Show resolved Hide resolved

fchollet reviewed Oct 22, 2025

View reviewed changes

address review comments

584d225

google-ml-butler bot removed the ready to pull Ready to be merged into the codebase label Oct 22, 2025

divyashreepathihalli added the kokoro:force-run label Oct 22, 2025

kokoro-team removed the kokoro:force-run label Oct 22, 2025

divyashreepathihalli merged commit 465a56d into keras-team:master Oct 23, 2025
11 checks passed

google-ml-butler bot removed the awaiting review label Oct 23, 2025

hertschuh requested changes Oct 23, 2025

View reviewed changes

		for strategy in self.strategies:
		for strategy in self.distillation_loss:

Add Distillation API to Keras #21572

Add Distillation API to Keras #21572

Conversation

divyashreepathihalli commented Aug 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Key Features

Core Components

Usage Examples

Basic Knowledge Distillation

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

divyashreepathihalli commented Aug 12, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

divyashreepathihalli commented Aug 12, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov-commenter commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fchollet left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

divyashreepathihalli commented Aug 11, 2025 •

edited

Loading

codecov-commenter commented Aug 12, 2025 •

edited

Loading

fchollet Oct 22, 2025 •

edited

Loading